Within the last fifteen years, network theory has been successfully applied both to natural sciences and to socioeconomic disciplines. In particular, bipartite networks have been recognized to provide a particularly insightful representation of many systems, ranging from mutualistic networks in ecology to trade networks in economics, hence the need for a pattern-detection-oriented analysis able to identify statistically significant structural properties. Such an analysis rests upon the definition of suitable null models, i.e. upon the choice of the portion of network structure to be preserved while randomizing everything else. However, quite surprisingly, little work has been done so far to define null models for real bipartite networks. The aim of the present work is to fill this gap, extending to bipartite networks a recently proposed method to randomize monopartite networks. While the proposed formalism is perfectly general, we apply our method to the binary, undirected, bipartite representation of the World Trade Web, comparing the observed values of a number of structural quantities of interest with the expected ones, calculated via our randomization procedure. Interestingly, the behavior of the World Trade Web in this new representation is strongly different from the monopartite analogue, showing highly non-trivial patterns of self-organization.

PACS numbers: 89.75.Fb; 02.50.Tt; 89.65.Gh 


INTRODUCTION 

In the last fifteen years network science has exploded, revealing a world composed of interconnected systems, ubiquitously found both in natural sciences and in socioeconomic disciplines [1-3]. Since the very beginning of network science, many different network representations have been adopted in order to study the particular system at hand [4]. Among them, bipartite networks have been recognized to provide a particularly insightful representation of many different systems [5]: ecological networks [6], trade networks [7-9], citation and collaboration networks [10, 11] are only a few examples.

One could thus expect a relevant amount of work aimed at identifying the statistically relevant patterns observed in real bipartite networks, at least comparable to the mass of results obtained so far for monopartite networks [12-21]. However, quite surprisingly, little work has been done so far to implement null models on real bipartite networks. Generally speaking, null models are statistical models used to make inference on a real system on the basis of partial information. The latter usually corresponds to some observable property of interest, such as the number of trade partners of a country, its exports and imports, the total exposure of a bank, etc. In particular, null models for bipartite networks that are rooted in real data and show the desirable features of general applicability and analytical character are currently missing. In more detail, the algorithms proposed so far show several limitations: being purely numerical (thus lacking the analytical character) [6, 22, 23]; assuming an a priori functional form either for the distribution of the quantities of interest [6] or for the model parameters (thus not being rooted in real data) [24]; or, lastly, using approximate analytical models [25]. Moreover, almost all the aforementioned approaches are tailored to ecological networks, thus lacking the character of general applicability.

The lack of such models is perhaps also due to the misconception that bipartite networks can be analysed by first projecting them onto one of the layers and then analysing the projection with one of the models currently available for monopartite networks. As we will show in what follows, the monopartite and the bipartite representations encode different kinds of information, irreducible to each other (in the most general case).

The aim of the present paper is to fill this gap, proposing a theoretical framework guaranteeing the three aforementioned properties (being rooted in real data, general applicability and analytical character). In order to do this, we extend a recently proposed method to randomize monopartite networks [19] to bipartite networks. The method rests upon the sequential maximization of Shannon entropy and of the network likelihood function, a combination which has proven rather effective both for detecting patterns and for reconstructing the structure of several real-world networks [20, 26-30]. To the best of our knowledge, the only other paper proposing a method satisfying the three requirements above is [31]: we will comment on the differences with the one proposed here in the Discussion section.

While the proposed formalism is perfectly general, in this paper we apply our method to the binary, undirected, bipartite representation of the World Trade Web (hereafter WTW). We focus on this particular system precisely because of its popularity among network scientists, who have applied null models to all its possible representations [26, 27, 32-35], with the exception of the bipartite one. As we will show in what follows, representing the WTW as a bipartite network allows us to gain substantially new insight into an already deeply explored system.

The rest of the paper is organized as follows: the Data section describes the dataset used for the present analysis, the Methods section reports the detailed description of our method, the Results section illustrates our results, and the Discussion section discusses them and draws the conclusions.


DATA 

The WTW can be represented in many different ways, depending on the level of information that we want to process. The most popular representations use an adjacency matrix, with nodes playing the role of world countries and links indicating the presence of (any kind of) trade exchange between them. This framework has recently been extended to analyse the WTW as a multiplex, where trade exchanges corresponding to different commodities are distinguished [35, 49].

Here we represent the WTW as a bipartite network, i.e. by considering the set of world countries and the set of products as different entities and linking a given country to a given product if (and only if) the former exports the latter above a certain threshold (the so-called RCA, Revealed Comparative Advantage [8, 9]). Applying this threshold raises the probability that the exported commodity is actually produced by the exporting country. In this representation, any two countries (as well as any two products) cannot be directly linked (i.e. links connecting nodes of the same set are not allowed): thus, any two nodes of the same set can still be thought of as “interacting”, but only indirectly, via connections with the same nodes of the other set. This way of representing the WTW allows us to analyze the global economy from a different perspective, by making the productivity relations between countries explicit (i.e. which country produces which product).
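As an illustration of the binarization step, the following minimal sketch assumes the Balassa form of the RCA index and a threshold equal to 1. Neither the formula nor the threshold value is stated in this section, so both are assumptions, as are the function name `rca_binarize` and the toy export volumes.

```python
def rca_binarize(exports, threshold=1.0):
    """Binarize a country-by-product export table.

    ASSUMPTION: RCA is taken in the Balassa form, i.e. the share of product p
    in the basket of country c divided by the share of p in world trade.
    Every country and every product must have a non-zero total.
    """
    n_c, n_p = len(exports), len(exports[0])
    row_tot = [sum(row) for row in exports]            # total exports per country
    col_tot = [sum(col) for col in zip(*exports)]      # world exports per product
    grand = sum(row_tot)                               # total world trade
    m = []
    for c in range(n_c):
        row = []
        for p in range(n_p):
            rca = (exports[c][p] / row_tot[c]) / (col_tot[p] / grand)
            row.append(1 if rca >= threshold else 0)   # assumed threshold: RCA >= 1
        m.append(row)
    return m


# Hypothetical export volumes (2 countries x 3 products).
toy_exports = [[10.0, 0.0, 5.0],
               [2.0, 8.0, 0.0]]
M = rca_binarize(toy_exports)
print(M)
```

In this toy table the second country exports the first product, but with a share below the world average, so the corresponding entry of the biadjacency matrix is set to zero.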

The dataset we have considered for the present analysis is the NBER database, collecting data for the 38 years 1963-2000 [37] and categorizing products according to the SITC revision 2 at the four-digit level. Data have been further processed, building upon the data-mining procedure adopted in [38], to produce a dataset with 538 products across all years and a number of countries varying from 130 to 151.

FIG. 1. The binary, undirected, bipartite representation of the World Trade Web in the year 2000 [37]: countries are listed along the rows, products along the columns. Blue dots represent the ones, white dots represent the zeros. Rows and columns are reordered according to the algorithm introduced in [8, 9].


METHODS 

The distinction between countries and products leads naturally to the definition of a biadjacency matrix, which will be indicated with $\mathbf{M}$. In the present paper we focus on the binary, undirected representation of the WTW: thus, the matrix entries will be either $m_{cp} = 1$, indicating that country $c$ exports an amount of product $p$ above the RCA threshold, or $m_{cp} = 0$, indicating that the production of $p$ by country $c$ is below the RCA threshold and has thus been ignored. As a consequence, each row represents the export basket of a given country, while each column represents the subset of producers of a given product. A pictorial representation of the WTW biadjacency matrix in the year 2000 is shown in fig. 1, with the blue dots representing the ones and the white dots the zeros.

If we indicate with $C$ the total number of countries and with $P$ the total number of products, the total number of elements of the biadjacency matrix (i.e. its volume) is $C \cdot P$, which is also the maximum observable number of connections. In fact, unlike the usual square representation, the problems arising from the presence of self-connections are not encountered here. Moreover, the presence of two different subsets (also known as layers) induces a measure of “rectangularity” of our matrix $\mathbf{M}$ [6], i.e. $R = |C - P|/(C + P)$, ranging in $R \in [0, 1)$, with values closer to 1 indicating a large asymmetry between the number of countries and the number of products and values closer to 0 indicating equivalence between the cardinalities of the two layers (notice that the information on the sign would be based on the arbitrary choice of the ordering of the layers).

The definitions of other topological quantities of interest easily follow from the usual ones, such as the number of links (i.e. the total number of connections)

$$L(\mathbf{M}) = \sum_{c=1}^{C} \sum_{p=1}^{P} m_{cp}, \tag{1}$$

and the connectance $c(\mathbf{M}) = L/(C \cdot P)$, measuring the fraction of observed connections. Fundamental properties are represented by the number of node-specific connections, i.e. the degree of countries, also named diversification [7-9], measuring the number of products exported by each country

$$d_c(\mathbf{M}) = \sum_{p=1}^{P} m_{cp}, \tag{2}$$

and the degree of products, also named ubiquity [7-9], measuring the number of countries exporting each product

$$u_p(\mathbf{M}) = \sum_{c=1}^{C} m_{cp}. \tag{3}$$

Definitions (2) and (3) induce the notions of countries mean degree and products mean degree:

$$\bar{d}(\mathbf{M}) = \frac{\sum_{c=1}^{C} d_c(\mathbf{M})}{C} = \frac{L(\mathbf{M})}{C}, \tag{4}$$

$$\bar{u}(\mathbf{M}) = \frac{\sum_{p=1}^{P} u_p(\mathbf{M})}{P} = \frac{L(\mathbf{M})}{P}. \tag{5}$$

The last passage follows from noticing that $L(\mathbf{M}) = \sum_{c=1}^{C} d_c(\mathbf{M}) = \sum_{p=1}^{P} u_p(\mathbf{M})$.
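The quantities defined so far can be checked on a small example. The sketch below uses a hypothetical 4 x 5 biadjacency matrix (not taken from the dataset); the rectangularity is taken as $|C - P|/(C + P)$, an assumed form that is consistent with the value $R \simeq 0.56$ quoted in the Results section for $C = 151$ and $P = 538$.

```python
# Toy biadjacency matrix (4 countries x 5 products); values are illustrative only.
M = [[1, 1, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [1, 0, 0, 0, 1],
     [0, 1, 0, 0, 0]]

C, P = len(M), len(M[0])
d = [sum(row) for row in M]            # diversification d_c, eq. (2)
u = [sum(col) for col in zip(*M)]      # ubiquity u_p, eq. (3)
L = sum(d)                             # number of links, eq. (1)
connectance = L / (C * P)              # fraction of observed connections
R = abs(C - P) / (C + P)               # ASSUMED form of the rectangularity index
d_mean, u_mean = L / C, L / P          # mean degrees, eqs. (4) and (5)

print(d, u, L, connectance, R, d_mean, u_mean)
```

Summing the degrees over either layer returns the same number of links, which is the identity used in eqs. (4) and (5).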

In order to make the connections between nodes of the same layer explicit, a bipartite network can be projected onto its layers, thus recovering two traditional, monopartite representations. This operation can be straightforwardly implemented by considering the matrix products

$$\mathcal{C} = \mathbf{M} \cdot \mathbf{M}^T, \qquad \mathcal{P} = \mathbf{M}^T \cdot \mathbf{M}, \tag{6}$$

where $\mathbf{M}^T$ is the transpose of the biadjacency matrix $\mathbf{M}$. While the dimensions of $\mathbf{M}$ are $C \times P$, the dimensions of its transpose are $P \times C$. This implies that $\mathcal{C}$ is a $C \times C$ matrix whose generic element $\mathcal{C}_{cc'}$, with $c \neq c'$, counts the number of paths of length two between countries $c$ and $c'$. The generic diagonal element $\mathcal{C}_{cc}$ is precisely the degree of country $c$. Similarly, $\mathcal{P}$ is a $P \times P$ matrix whose generic element $\mathcal{P}_{pp'}$, with $p \neq p'$, counts the number of paths of length two between products $p$ and $p'$. As before, the generic diagonal element $\mathcal{P}_{pp}$ is the degree of product $p$. Remarkably, the entries of matrices $\mathcal{C}$ and $\mathcal{P}$ have a clear macroeconomic interpretation: while $\mathcal{C}_{cc'}$ counts the number of products exported by both countries $c$ and $c'$, $\mathcal{P}_{pp'}$ counts the number of countries exporting both products $p$ and $p'$.

Since nodes of the same layer cannot be directly linked, it is enough that a path of length two (i.e. the minimum allowed length) connects any two nodes of the same layer for them to be directly linked in the corresponding monopartite projection. Thus, by first applying the Heaviside step function $\Theta[\ldots]$ to matrices $\mathcal{C}$ and $\mathcal{P}$ element-wise (i.e. $\Theta[\mathcal{C}] = \{\Theta[\mathcal{C}_{cc'}]\}_{c,c'=1}^{C}$, where $\Theta[\mathcal{C}_{cc'}]$ equals 0 if $\mathcal{C}_{cc'} = 0$ and 1 if $\mathcal{C}_{cc'} > 0$, and similarly for $\mathcal{P}$) and then subtracting the diagonal elements, the binary adjacency matrices describing the two monopartite projections are recovered, i.e.

$$\mathbf{C} = \Theta[\mathcal{C}] - \mathbf{I}_C, \qquad \mathbf{P} = \Theta[\mathcal{P}] - \mathbf{I}_P, \tag{7}$$

where $\mathbf{I}_C$ and $\mathbf{I}_P$ are the identity matrices having dimensions $C \times C$ and $P \times P$ respectively.
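The two projections can be sketched in a few lines. The example below is only illustrative: the matrix is a hypothetical toy case and the helper names `cooccurrence` and `binary_projection` are not part of the original method. The diagonal is zeroed explicitly, which coincides with subtracting the identity whenever every node has at least one link.

```python
def cooccurrence(rows):
    """Matrix product rows . rows^T: entry (i, j) counts the neighbours shared by i and j."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]


def binary_projection(cooc):
    """Heaviside step on the off-diagonal entries; the diagonal is set to zero."""
    n = len(cooc)
    return [[1 if i != j and cooc[i][j] > 0 else 0 for j in range(n)]
            for i in range(n)]


# Toy biadjacency matrix (4 countries x 5 products); values are illustrative only.
M = [[1, 1, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [1, 0, 0, 0, 1],
     [0, 1, 0, 0, 0]]
Mt = [list(col) for col in zip(*M)]    # transpose, P x C

cal_C = cooccurrence(M)                # C x C matrix of shared products
cal_P = cooccurrence(Mt)               # P x P matrix of shared exporters
adj_C = binary_projection(cal_C)       # country-country adjacency matrix
adj_P = binary_projection(cal_P)       # product-product adjacency matrix

print(cal_C)
print(adj_C)
```

The diagonals of the two co-occurrence matrices reproduce the degree sequences, while the third and fourth countries, which share no product, remain unlinked in the projection.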

Topological measures for binary, undirected, 
bipartite networks 

Several quantities have already been proposed to analyse bipartite networks [6]. However, here we define different measures by extending some of the most used indicators in network theory; in our opinion, these better capture the particular features of a given bipartite network structure.

a. Assortativity. The traditional definition of assortativity is intended to quantify degree correlations, distinguishing the assortative behavior (signalling positive degree correlations) from the disassortative behavior (signalling negative degree correlations). When dealing with bipartite networks, we can measure such correlations both with respect to countries and with respect to products, by respectively defining the average nearest products ubiquity (or ANPU)

$$u_c^{nn}(\mathbf{M}) = \frac{\sum_{p=1}^{P} m_{cp} u_p}{d_c} \tag{8}$$

and the average nearest countries diversification (or ANCD)

$$d_p^{nn}(\mathbf{M}) = \frac{\sum_{c=1}^{C} m_{cp} d_c}{u_p}. \tag{9}$$

As in the monopartite case, assortativity is quantified by respectively scattering the ANPU and ANCD values versus the degree sequences $\{d_c\}_{c=1}^{C}$ and $\{u_p\}_{p=1}^{P}$.
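A minimal sketch of the two assortativity measures follows, assuming the ANPU of a country is the mean ubiquity of the products it exports and the ANCD of a product is the mean diversification of its exporters. The matrix is a hypothetical toy example and the function name is illustrative; nodes with zero degree are not handled.

```python
def nearest_neighbour_degrees(M):
    """Return (d, u, u_nn, d_nn) for a binary biadjacency matrix M.

    Assumes that every country and every product has at least one link.
    """
    C, P = len(M), len(M[0])
    d = [sum(row) for row in M]                 # diversification
    u = [sum(col) for col in zip(*M)]           # ubiquity
    # ANPU: mean ubiquity of the products exported by country c
    u_nn = [sum(M[c][p] * u[p] for p in range(P)) / d[c] for c in range(C)]
    # ANCD: mean diversification of the countries exporting product p
    d_nn = [sum(M[c][p] * d[c] for c in range(C)) / u[p] for p in range(P)]
    return d, u, u_nn, d_nn


# Toy biadjacency matrix (4 countries x 5 products); values are illustrative only.
M = [[1, 1, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [1, 0, 0, 0, 1],
     [0, 1, 0, 0, 0]]
d, u, u_nn, d_nn = nearest_neighbour_degrees(M)
print(list(zip(d, u_nn)))   # points of the ANPU scatter plot
print(list(zip(u, d_nn)))   # points of the ANCD scatter plot
```

In this toy case the least diversified country (one product) is attached to a product of ubiquity 3, the highest value of the ANPU, which is the kind of decreasing trend associated with disassortativity.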
b. Complexity and fitness. As recently pointed out [8, 9], countries and products can be assigned two purely network-based quantities, known as fitness, $F_c$ (to be assigned to countries), and complexity, $Q_p$ (to be assigned to products), which play the role of non-monetary indicators of economic development and provide a highly non-trivial way to rank the economic health of world countries (see also the Supplementary Information).

c. Motifs. The usual clustering coefficient, measuring the hierarchical structure of a monopartite network, cannot be defined for bipartite networks: since no odd cycles of any length can be observed in bipartite networks (precisely because links within the same layer are forbidden), triangles cannot be observed either; similarly, the usual triangular motifs cannot be defined [3, 39].

However, higher-order correlations between nodes can still be captured by defining a completely new class of motifs. The first examples we provide are the V-motifs and the Λ-motifs (see fig. 2). The former count how many pairs of countries export the same products, quantifying the similarity of their productivities; the latter count how many pairs of products are in the basket of the same producer, providing a measure of the correlation between products. Remembering that $\mathcal{C}_{cc'}$, with $c \neq c'$, counts the number of products exported by both $c$ and $c'$, the total number of V-motifs connecting any pair of countries is

$$N_V(\mathbf{M}) = \sum_{c=1}^{C} \sum_{c'=c+1}^{C} \mathcal{C}_{cc'} = \sum_{c=1}^{C} \sum_{c'=c+1}^{C} \sum_{p=1}^{P} m_{cp} m_{c'p} = \sum_{p=1}^{P} \binom{u_p}{2} \tag{10}$$

and, remembering the analogous role of $\mathcal{P}_{pp'}$, the total number of Λ-motifs connecting any pair of products is

$$N_\Lambda(\mathbf{M}) = \sum_{p=1}^{P} \sum_{p'=p+1}^{P} \mathcal{P}_{pp'} = \sum_{p=1}^{P} \sum_{p'=p+1}^{P} \sum_{c=1}^{C} m_{cp} m_{cp'} = \sum_{c=1}^{C} \binom{d_c}{2}. \tag{11}$$

The last passages follow from noticing that each V-motif (Λ-motif) is constituted by a pair of links having the same product (country) as a common vertex. The number of countries competing on the same product, as well as the number of products in the same basket, can be further raised, leading to the following generalizations (with $V_2 \equiv V$ and $\Lambda_2 \equiv \Lambda$):

$$N_{V_n}(\mathbf{M}) = \sum_{p=1}^{P} \binom{u_p}{n}, \qquad N_{\Lambda_n}(\mathbf{M}) = \sum_{c=1}^{C} \binom{d_c}{n}; \tag{12}$$

fig. 2 shows an example of $V_3$-motifs and $\Lambda_3$-motifs. From definitions (12) it follows that $N_{V_1} = N_{\Lambda_1} = L$.

FIG. 2. Motifs for bipartite networks. Countries are reported in the upper layer, products in the bottom layer. The bottom panel shows motifs belonging to the $V_n$ and $\Lambda_n$ families, with $n = 2, 3$.

Higher-order correlations can be captured by allowing for a higher number of connected nodes in the same layers (see X-motifs, M-motifs and W-motifs in the Supplementary Information). Remarkably, all the defined kinds of motifs:

• can be compactly expressed in terms of products of biadjacency matrix entries;

• can be defined for specific subsets of countries and products, thus allowing for a finer analysis of the production dynamics. For example, a measure of the correlation between the productions of countries $a$ and $b$ is given by the motif $N_V^{a,b} = \mathcal{C}_{ab} = \sum_{p=1}^{P} m_{ap} m_{bp}$;

• may also find application in the analysis of ecological networks, especially mutualistic networks (e.g. pollinators-flowers): in fact, measures of co-occurrence can be directly applied to ecosystems to quantify the competitiveness of species for the available resources.

In what follows we will focus on the $V_n$ and $\Lambda_n$ families (a more detailed discussion of all motifs is provided in the Supplementary Information).
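The motif counts of the $V_n$ and $\Lambda_n$ families reduce to sums of binomial coefficients of the degrees, which the following sketch illustrates on a hypothetical toy matrix. The function names are illustrative; the brute-force routine counts shared products over all pairs of countries and is included only as a cross-check of the binomial form for $n = 2$.

```python
from math import comb


def motif_counts(M, n=2):
    """Return (N_Vn, N_Lambda_n) computed from the degree sequences."""
    d = [sum(row) for row in M]            # diversification
    u = [sum(col) for col in zip(*M)]      # ubiquity
    n_v = sum(comb(k, n) for k in u)       # n countries sharing one product
    n_lambda = sum(comb(k, n) for k in d)  # n products in one basket
    return n_v, n_lambda


def v_motifs_brute_force(M):
    """Count V-motifs as the number of shared products over all country pairs."""
    C = len(M)
    return sum(sum(a * b for a, b in zip(M[c], M[k]))
               for c in range(C) for k in range(c + 1, C))


# Toy biadjacency matrix (4 countries x 5 products); values are illustrative only.
M = [[1, 1, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [1, 0, 0, 0, 1],
     [0, 1, 0, 0, 0]]
print(motif_counts(M), v_motifs_brute_force(M))
```

Setting `n=1` returns the number of links for both families, in agreement with the remark that follows definitions (12).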

d. Assortativity coefficient. Besides our definitions, we have also considered the assortativity measure proposed in [40] and called $r$. The latter ranges in the domain $r \in [-1, 1]$, with $r = 1$ indicating the tendency of links to connect nodes with similar degrees and $r = -1$ indicating the tendency of links to connect nodes with different degrees.

e. Nestedness. On the basis of the two aforementioned measures $F_c$ and $Q_p$, one can reorder the matrix rows and columns (i.e. countries and products) by, respectively, decreasing fitness along rows (from top to bottom) and increasing complexity along columns (from left to right), thus obtaining the triangular structure shown in fig. 7. In order to quantify the shape of such a matrix, several measures have been recently proposed [41-44], under the common name of nestedness. Here we adopt the one proposed in [41] (called NODF; see also the Supplementary Information). Notice that the measure of nestedness adopted here does not depend on the criterion used to order rows and columns (in what follows we will adopt the one based on the $F_c$ and $Q_p$ measures) [8, 9].


Randomizing bipartite networks 

In order to implement suitable null models to detect the statistically relevant patterns of real bipartite networks, the method proposed in [19] can be followed. In particular, an ensemble $\mathcal{G}$ of binary, undirected, bipartite networks must be considered, in order to maximize the Shannon entropy

$$S = -\sum_{\mathbf{M} \in \mathcal{G}} P(\mathbf{M}) \ln P(\mathbf{M}) \tag{13}$$

under a given set of constraints $\vec{C}(\mathbf{M})$ [16, 19]. Notice that a probability coefficient $P(\mathbf{M})$ is assigned to every matrix in the ensemble and that the constraints are defined in terms of the entries of $\mathbf{M}$. The result is the well-known exponential distribution

$$P(\mathbf{M}|\vec{\theta}) = \frac{e^{-H(\mathbf{M}, \vec{\theta})}}{Z(\vec{\theta})} \tag{14}$$

with the Hamiltonian $H(\mathbf{M}, \vec{\theta}) = \vec{\theta} \cdot \vec{C}(\mathbf{M})$ compactly expressing the imposed set of constraints, $\vec{\theta}$ being the vector of Lagrange multipliers associated with the vector of constraints and $Z(\vec{\theta}) = \sum_{\mathbf{M} \in \mathcal{G}} e^{-H(\mathbf{M}, \vec{\theta})}$ being the normalization.

In the monopartite case, one of the most insightful null models has proven to be the so-called Configuration Model (CM) [12, 14]. Let us now implement the bipartite extension of the CM (BiCM, in what follows), by constraining the degree sequences of the binary, undirected, bipartite WTW and analyzing the system beyond the information they contain. Since we now have two different layers of nodes, the Hamiltonian reads

$$H(\mathbf{M}, \vec{\theta}) = \vec{\alpha} \cdot \vec{d}(\mathbf{M}) + \vec{\beta} \cdot \vec{u}(\mathbf{M}). \tag{15}$$

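Now we can calculate the probability coefficient (14), associating a probability to each network in the ensemble on the basis of the specific degree sequences $\vec{d}(\mathbf{M})$ and $\vec{u}(\mathbf{M})$:

$$P(\mathbf{M}|\vec{\theta}) = \frac{e^{-\vec{\alpha} \cdot \vec{d}(\mathbf{M}) - \vec{\beta} \cdot \vec{u}(\mathbf{M})}}{\sum_{\mathbf{M} \in \mathcal{G}} e^{-\vec{\alpha} \cdot \vec{d}(\mathbf{M}) - \vec{\beta} \cdot \vec{u}(\mathbf{M})}} = \prod_{c,p} p_{cp}^{m_{cp}} (1 - p_{cp})^{1 - m_{cp}}, \tag{16}$$

the notation $\prod_{c,p}$ being equivalent to $\prod_{c=1}^{C} \prod_{p=1}^{P}$ (see the Supplementary Information for the detailed calculations). The coefficient $p_{cp} = \frac{x_c y_p}{1 + x_c y_p}$, with $e^{-\alpha_c} \equiv x_c$ and $e^{-\beta_p} \equiv y_p$, is the ensemble probability of having a link between country $c$ and product $p$, as $\langle m_{cp} \rangle = \sum_{\mathbf{M} \in \mathcal{G}} m_{cp}(\mathbf{M}) P(\mathbf{M}|\vec{\theta}) = p_{cp} = \frac{x_c y_p}{1 + x_c y_p}$.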

Our null model provides the analytical expression of a network probability as a product over all the accessible $C \times P$ pairs of nodes. In other words, the BiCM treats the links as independent random variables, thus defining a grandcanonical probability measure where correlations between links are discarded. Notice also that no probability coefficients controlling for the presence of links between nodes in the same layer appear in expression (16). This is a consequence of having considered an ensemble of bipartite networks as the support of our probability distribution: in so doing, the forbidden intra-layer links are automatically excluded by the choice of the volume of allowable configurations.

The probability distribution in (16) depends on $C + P$ unknown parameters (i.e. the Lagrange multipliers), also called hidden variables [13, 24]. The recipe provided by statistical mechanics to estimate the hidden variables is summed up by the equations

$$-\frac{\partial \ln Z}{\partial \alpha_c} = \langle d_c \rangle, \ \forall c; \qquad -\frac{\partial \ln Z}{\partial \beta_p} = \langle u_p \rangle, \ \forall p. \tag{17}$$


However, no indication about the numerical value to be assigned to the ensemble average of the constraints is provided. Thus, in order to estimate the hidden variables from data, let us first note that $P(\mathbf{M}|\vec{\theta})$ can be rewritten solely in terms of the observed values of the constraints, i.e. $P(\mathbf{M}|\vec{x}, \vec{y}) = \prod_c x_c^{d_c(\mathbf{M})} \prod_p y_p^{u_p(\mathbf{M})} \prod_{c,p} (1 + x_c y_p)^{-1}$. Then, let us consider the log-likelihood function $\mathcal{L}(\vec{x}, \vec{y}) = \ln P(\mathbf{M}|\vec{x}, \vec{y})$:

$$\mathcal{L}(\vec{x}, \vec{y}) = \sum_{c=1}^{C} d_c(\mathbf{M}) \ln x_c + \sum_{p=1}^{P} u_p(\mathbf{M}) \ln y_p - \sum_{c=1}^{C} \sum_{p=1}^{P} \ln(1 + x_c y_p). \tag{18}$$


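For a generic quantity $X$, the sampling moments used in the subsection on expected topological measures read, with the sums running over the sampled matrices,

$$\langle X \rangle \simeq \overline{X} = \sum_{\mathbf{M} \in \mathcal{G}} X(\mathbf{M}) P(\mathbf{M}), \tag{20}$$

$$\sigma_X^2 \simeq \overline{\sigma}_X^2 = \sum_{\mathbf{M} \in \mathcal{G}} (X(\mathbf{M}) - \overline{X})^2 P(\mathbf{M}). \tag{21}$$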


The recipe provided by statistics to estimate the unknown parameters of a given probability distribution prescribes maximizing $\mathcal{L}$ [19]. This means solving the system $\nabla \mathcal{L}(\vec{x}, \vec{y}) = \vec{0}$ of $C + P$ equations in $C + P$ unknowns [19]:

$$\begin{cases} d_c(\mathbf{M}) = \sum_{p=1}^{P} \frac{x_c y_p}{1 + x_c y_p}, & c = 1 \ldots C, \\ u_p(\mathbf{M}) = \sum_{c=1}^{C} \frac{x_c y_p}{1 + x_c y_p}, & p = 1 \ldots P. \end{cases} \tag{19}$$

In what follows the vector of solutions satisfying the system (19), for given $\vec{d}(\mathbf{M})$ and $\vec{u}(\mathbf{M})$ as degree mean values, will be indicated as $(\vec{x}^*, \vec{y}^*)$. Notice that the coefficients appearing on the right-hand side of the equations have the same functional form for both countries and products. This is a consequence of assigning only one Lagrange multiplier to each node, but in such a way as to distinguish the nodes in the first layer from the nodes in the second layer.
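One possible way to solve system (19) numerically is a fixed-point iteration, in which each equation is rearranged to isolate $x_c$ (or $y_p$) and the two layers are updated in turn. This scheme is an assumption made here for illustration: the text does not specify which solver is used. The degree sequences are hypothetical toy values and must not contain zero or full degrees, for which no finite solution exists.

```python
def solve_bicm(d, u, sweeps=5000):
    """Fixed-point iteration for system (19); returns the hidden variables (x, y).

    ASSUMPTION: this alternating update is one possible solver, not necessarily
    the one used in the paper. Requires 0 < d_c < P and 0 < u_p < C.
    """
    C, P = len(d), len(u)
    L = sum(d)
    x = [k / L ** 0.5 for k in d]      # starting guess
    y = [k / L ** 0.5 for k in u]
    for _ in range(sweeps):
        # d_c = sum_p x_c y_p / (1 + x_c y_p), rearranged to isolate x_c
        x = [d[c] / sum(y[p] / (1.0 + x[c] * y[p]) for p in range(P))
             for c in range(C)]
        # u_p = sum_c x_c y_p / (1 + x_c y_p), rearranged to isolate y_p
        y = [u[p] / sum(x[c] / (1.0 + x[c] * y[p]) for c in range(C))
             for p in range(P)]
    return x, y


def link_probabilities(x, y):
    """p_cp = x_c y_p / (1 + x_c y_p) for every country-product pair."""
    return [[xc * yp / (1.0 + xc * yp) for yp in y] for xc in x]


# Hypothetical degree sequences (4 countries, 5 products, 9 links).
d_obs = [3, 3, 2, 1]
u_obs = [3, 3, 1, 1, 1]
x_star, y_star = solve_bicm(d_obs, u_obs)
p = link_probabilities(x_star, y_star)
print([round(sum(row), 6) for row in p])          # expected diversifications
print([round(sum(col), 6) for col in zip(*p)])    # expected ubiquities
```

At convergence the row and column sums of the probability matrix reproduce the observed degree sequences, which is exactly the content of system (19).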


Expected topological measures for binary, 
undirected, bipartite networks 

In the previous subsections several quantities of interest to be measured on binary, undirected, bipartite networks have been listed. In this subsection we show how our method can be used to calculate their expected values (to be compared with the observed ones) and the related errors (to quantify the discrepancies), in order to assess to what extent our null model is able to explain the higher-order structure of the network.

Our method allows us to proceed in two ways. The first one is analytical: using the link-specific probability coefficients $p_{cp}$ and the passages sketched in [19], we are able to analytically calculate both the expected value and the standard deviation of the (analytically definable) quantities of the previous subsections. However, because the average of some key quantities cannot be evaluated analytically, we have also adopted a different strategy: we have sampled the grandcanonical ensemble of binary, undirected, bipartite networks induced by the BiCM according to the probability coefficients $P(\mathbf{M}|\vec{x}^*, \vec{y}^*)$, measured the aforementioned properties on our sample and calculated the statistical moments, such as average and standard deviation, of the generic quantity $X$ as in eqs. (20) and (21), i.e. as sampling moments according to the sampling frequencies $P(\mathbf{M}) = N_{\mathbf{M}}/N$ ($N_{\mathbf{M}}$ being the number of sampled networks having biadjacency matrix equal to $\mathbf{M}$ and $N$ the total number of sampled networks). Since our method is unbiased [19, 21], numerically sampling $\mathcal{G}$ provides a faithful representation of the whole ensemble. We have also calculated the probability distribution (induced by $P(\mathbf{M})$) of some of the properties of interest, in order to quantify the statistical significance of their observed value (via the z-score, for example).

Nevertheless, the analytical expressions of the expected value and standard deviation of the quantities explicitly defined in the previous subsections have been derived in the Supplementary Information.
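The numerical strategy can be sketched as follows: since links are independent, a matrix of the ensemble is drawn by switching on each link with its own probability, and the moments of any quantity are estimated over the sample. The probability matrix, the function names and the sample size below are hypothetical; in an actual application the probabilities would be the $p_{cp}$ evaluated at the solution of system (19). The z-score is taken in its usual form, observed value minus ensemble average, divided by the ensemble standard deviation.

```python
import random
from math import comb


def sample_matrix(p, rng):
    """Draw one biadjacency matrix, each link being an independent Bernoulli trial."""
    return [[1 if rng.random() < pcp else 0 for pcp in row] for row in p]


def n_v_motifs(M):
    """Number of V-motifs, i.e. the sum over products of binom(u_p, 2)."""
    return sum(comb(sum(col), 2) for col in zip(*M))


def ensemble_moments(p, quantity, n_samples=2000, seed=0):
    """Sampling average and standard deviation of a quantity over the ensemble."""
    rng = random.Random(seed)
    values = [quantity(sample_matrix(p, rng)) for _ in range(n_samples)]
    mean = sum(values) / n_samples
    var = sum((v - mean) ** 2 for v in values) / n_samples
    return mean, var ** 0.5


def z_score(observed, mean, std):
    """Number of standard deviations separating the observation from the expectation."""
    return (observed - mean) / std


# Hypothetical link probabilities (3 countries x 4 products), all equal to 0.5.
p_toy = [[0.5] * 4 for _ in range(3)]
mean_v, std_v = ensemble_moments(p_toy, n_v_motifs, n_samples=2000, seed=1)
print(mean_v, std_v, z_score(6, mean_v, std_v))
```

For this toy case the expected number of V-motifs can also be obtained analytically, because independence of the links gives $\langle N_V \rangle = \sum_p \sum_{c < c'} p_{cp} p_{c'p} = 3$; the sampled average should fluctuate around this value.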


RESULTS 

Let us first show our results on the temporal snapshot of the WTW corresponding to the year 2000. The number of nodes is $C_{2000} = 151$ and $P_{2000} = 538$, causing the $R$ index to be $R \simeq 0.56$ (see the Methods section). The high asymmetry of our network is also pointed out by the different mean degrees, $\bar{d} \simeq 70$ and $\bar{u} \simeq 20$, indicating that countries are, on average, about three and a half times more connected than products. However, the connectance is $c_{2000} \simeq 0.13$: thus, our bipartite WTW is much sparser than its monopartite counterpart [26]. Notice that our null model, constraining (on average) the degree sequences, exactly reproduces any network’s connectance by definition, spanning the domain of applicability of both the sparse and the dense network reconstruction algorithms.


Assortativity 

Fig. 3 shows the comparison between observed and expected values of our coefficients of assortativity. Having plotted $u_c^{nn}$ vs $d_c$ and $d_p^{nn}$ vs $u_p$, we first observe that the bipartite WTW shows a disassortative behavior, signalled by a globally decreasing trend of our measures. In more detail, two distinct behaviors seem to characterize $u_c^{nn}$ as a function of $d_c$: while countries with low diversification are preferentially linked to products with high ubiquity (left side of panels 3a and 3b), countries with high diversification are linked to almost all products (right side of panels 3a and 3b). This is also reflected in the triangular structure of the matrix (see fig. 1). For products, this distinction is less sharp (panels 3c and 3d): in fact, while high-ubiquity products are linked to almost all countries, low-ubiquity products can be found connected to both high- and low-diversification countries.

FIG. 3. Application of our method to the binary, undirected, bipartite World Trade Web in the year 1963 (left column) and 2000 (right column). Panels report $u_c^{nn}$ vs $d_c$ (a, b) and $d_p^{nn}$ vs $u_p$ (c, d). Observed points are in blue; the black, solid curves are BiCM-induced ensemble averages; the red, solid lines are BiRG-induced ensemble averages; the gray, dashed curves indicate the ±1 standard deviation region; the gray, dash-dotted curves indicate the ±2 standard deviations region. Colored areas represent the ensemble density of expected points (sampling 5000 matrices). Although the BiCM captures the disassortative trend of the WTW, its striking similarity with the BiRG predictions shows that the explanatory power of the degree sequence is far more limited in the bipartite representation than in the monopartite one [26].

As can be seen from fig. 3, the BiCM captures the disassortative behavior of both $u_c^{nn}$ and $d_p^{nn}$; however, only part of the observed points lie within the ±2 standard deviations region. This means that the mechanism shaping the disassortative behavior of the WTW is not completely explained by our null model, signalling a non-trivial origin of the WTW degree correlations. What is strikingly surprising is the prediction based on the Random Graph model (BiRG): the corresponding trend is closer to the BiCM prediction than in the monopartite representation of the WTW [26]. Moreover, since disassortativity is more pronounced in real data, our results indicate that the BiCM performs better than the BiRG for small values of $d_c$ and $u_p$, while the BiRG correctly captures their flat behavior at large $d_c$ and $u_p$ (i.e. for competitive countries and ubiquitous products, for which $\langle d_p^{nn} \rangle_{BiRG} \simeq L/C$ and $\langle u_c^{nn} \rangle_{BiRG} \simeq L/P$). This seems to indicate that the explanatory power of the degree sequence is far more limited in the bipartite representation than in the monopartite one and that additional information is required to improve the agreement between observations and predictions (even at the simplest level of binary, undirected networks).

Fig. 4 extends our assortativity analysis to the entire dataset. In order to condense the information of 38 scatter plots, we have computed the barycenter and sparseness of both the observed and expected clouds of points. In particular, we have calculated the arithmetic mean of both the observed values $\{u_c^{nn}\}_{c=1}^{C}$,

$$\overline{u^{nn}} = \frac{1}{C} \sum_{c=1}^{C} u_c^{nn} = \frac{1}{C} \sum_{c=1}^{C} \sum_{c'=1}^{C} \frac{\mathcal{C}_{cc'}}{d_c}, \tag{22}$$

and $\{d_p^{nn}\}_{p=1}^{P}$, the expected values $\{\langle u_c^{nn} \rangle\}_{c=1}^{C}$ and $\{\langle d_p^{nn} \rangle\}_{p=1}^{P}$ and the corresponding confidence intervals (CI) at the 95% level. As for the motifs, $\overline{u^{nn}}$ and $\overline{d^{nn}}$ can also be interpreted in macroeconomic terms. In fact, $\sum_{c'=1}^{C} \mathcal{C}_{cc'}/d_c$ measures the country-specific number of competitions, thus quantifying the (average) presence of a country on the global market. Further averaging over all countries provides a measure of the integration of world-countries production.

FIG. 4. Temporal evolution of the arithmetic mean of the observed $\{u_c^{nn}\}_{c=1}^{C}$ (•) and expected $\{\langle u_c^{nn} \rangle\}_{c=1}^{C}$ (•), together with the 95% CI (panel a); temporal evolution of the arithmetic mean of the observed $\{d_p^{nn}\}_{p=1}^{P}$ (•) and expected $\{\langle d_p^{nn} \rangle\}_{p=1}^{P}$ (•), together with the 95% CI (panel c); temporal evolution of the Pearson correlation coefficient between $\{u_c^{nn}\}_{c=1}^{C}$ and $\{\langle u_c^{nn} \rangle\}_{c=1}^{C}$ (▲) together with the 95% CI (panel b) and between $\{d_p^{nn}\}_{p=1}^{P}$ and $\{\langle d_p^{nn} \rangle\}_{p=1}^{P}$ (■) together with the 95% CI (panel d). The evolution of expected points closely follows the evolution of the observed ones, pointing out that the BiCM correctly describes the temporal trend of the assortativity indices.

What emerges is that the evolution of expected points closely follows the evolution of the observed ones, pointing out that the BiCM correctly describes the temporal trend of the assortativity measures. Notice that, even if observed points are systematically more concentrated on higher levels (as shown in panels 4a and 4c), the confidence intervals are still close enough to let us interpret the BiCM predictions as correct. Moreover, the constancy of the amplitude of the confidence intervals for both observed and expected ANPU values indicates that the corresponding clouds of points maintain the same sparseness across our 38-year dataset; on the other hand, the amplitude of the observed ANCD confidence intervals slightly reduces, indicating a shrinkage of the corresponding cloud of points (compare panels 3b and 3d).

The temporal trends of $\overline{u^{nn}}$ and $\overline{d^{nn}}$ show interesting differences. In fact, while $\overline{u^{nn}}$ keeps increasing across the whole dataset, $\overline{d^{nn}}$ does not (and from 1975 it starts decreasing). Since the countries mean degree keeps rising as well ($\bar{d}_{1963} \simeq 48$ and $\bar{d}_{2000} \simeq 70$), the increasing trend is probably due to the birth of new links, indicating that while existing countries have enlarged their production, new-born countries have started theirs. The results also seem to be compatible with the picture of several “appealing” products behaving as hubs and attracting links, including those of the new-born countries which in turn, having a low degree, reduce the value of $d_p^{nn}$.

FIG. 5. Application of our method to the binary, undirected, bipartite World Trade Web in the year 1963 (left column) and 2000 (right column). Panels report $u_p$ vs $Q_p$ (a, b) and $d_c$ vs $F_c$ (c, d). Observed points are in blue; the black solid curves are BiCM-induced ensemble averages; the gray dashed curves indicate the ±1 standard deviation region; the gray dash-dotted curves indicate the ±2 standard deviations region. Colored areas represent the ensemble density of expected points (sampling 5000 matrices). Our null model seems to satisfactorily capture both trends. Panels (e, f) show the so-called “poverty trap”, i.e. the group of countries with lowest fitness [8, 9]. Notice how all such countries lie within the ±2 standard deviation region (or immediately outside).

Since $u^{nn}$ ranges in the interval $[0, C]$, the effect due to the varying number of countries can be washed away by further dividing it by $C$, $G_C = u^{nn}/C$ (thus normalizing it to the interval $[0, 1]$). Remarkably, our index $G_C$ can now be interpreted as a “genuine” measure of globalization, not affected by any spurious effect. Very interestingly, the temporal trend of $G_C$ after 1970 becomes almost flat. This means that the WTW evolution does not actually affect the value of countries' integration: countries organize in such a way as to maintain the same value of $G_C$, irrespective of the rising number of countries, their higher diversification, etc. This seems to confirm the stationary evolution of this network, recently pointed out [48]. A similar reasoning leads us to interpret $G_P = d^{nn}/P$ as a measure of products homogeneity.

We have also calculated the Pearson correlation coefficient between the vectors $\{u_c^{nn}\}_{c=1}^{C}$ and $\{\langle u_c^{nn}\rangle\}_{c=1}^{C}$ (panel 4b) and between the vectors $\{d_p^{nn}\}_{p=1}^{P}$ and $\{\langle d_p^{nn}\rangle\}_{p=1}^{P}$ (panel 4d), in order to quantify the agreement on the “shape” of the clouds of points. The correlation of the latter is lower than that of the former: this is due to the shape of the empirical ANCD cloud, which is less linear than the empirical ANPU one, thus worsening the agreement with the corresponding expectations (which show an almost perfectly linear trend).
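
The comparison between observed and expected vectors can be sketched as follows. The expected values use the approximation reported in the Supplementary Information, $\langle u_c^{nn}\rangle \simeq \sum_p p_{cp}(u_p - p_{cp} + 1)/d_c$ (and analogously for products). The probability matrix `P_hyp` is a hypothetical stand-in for the BiCM probabilities, and the function name is ours.

```python
import numpy as np

def expected_nn(P):
    """Approximate ensemble averages of u_c^nn and d_p^nn from link probabilities p_cp."""
    P = np.asarray(P, dtype=float)
    d = P.sum(axis=1)        # <d_c>, which equals d_c when the degrees are constrained
    u = P.sum(axis=0)        # <u_p>, which equals u_p when the degrees are constrained
    exp_u_nn = (P * (u[None, :] - P + 1.0)).sum(axis=1) / d
    exp_d_nn = (P * (d[:, None] - P + 1.0)).sum(axis=0) / u
    return exp_u_nn, exp_d_nn

# Toy observed matrix and a hypothetical matrix of link probabilities
M = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 0]], dtype=float)
P_hyp = np.array([[0.9, 0.8, 0.6],
                  [0.8, 0.6, 0.3],
                  [0.6, 0.3, 0.1]])

obs_u_nn = (M @ M.sum(axis=0)) / M.sum(axis=1)
exp_u_nn, exp_d_nn = expected_nn(P_hyp)

# Pearson correlation between observed and expected country-level values
pearson_u = np.corrcoef(obs_u_nn, exp_u_nn)[0, 1]
```

As a consistency check, feeding a deterministic matrix (all probabilities equal to 0 or 1) to `expected_nn` returns the observed values, since $m_{cp}(1 - m_{cp}) = 0$.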


Complexity and fitness


Complexity and fitness can be obtained only numerically, as the result of the convergence of the algorithm proposed in [8, 9, 45]. Panels 5a and 5b show the comparison between observed and expected complexity (plotted vs ubiquity) for the years 1963 and 2000; panels 5c and 5d show the comparison between observed and expected fitness (plotted vs diversification) for the same years. Our null model captures both trends with a higher accuracy than in the measure of assortativity: notice how the expected trend under the BiCM reproduces the “beak” of the observed complexity in real data and how the vast majority of the observed cloud lies within the ±2 standard deviations region.

Similarly, the expected trend of reconstructed fitness captures the different growth regimes of the observed fitness in the WTW data, showing few sparse points outside the same error region (clearly visible in the log-log plots of fig. 5). The regime with lower slope (left side of panels 5e and 5f) represents the so-called “poverty trap” [8, 9], i.e. the area populated by the group of countries with lowest fitness: notice how all such countries lie within the ±2 standard deviation region (or immediately outside). Similar considerations hold for all the remaining years, indicating a constant performance of our method across our 38-year dataset.

The average trends in fig. 5 are computed differently from those in fig. 3: while the latter represent the node-specific ensemble averages $\{\langle d_p^{nn}\rangle\}_{p=1}^{P}$ and $\{\langle u_c^{nn}\rangle\}_{c=1}^{C}$, the former represent averages taken over ranked nodes, ordered according to their complexity (panels a and b) and fitness (panels c and d). Generally speaking, ordering nodes in this way will produce a different ranking for different bipartite networks of the ensemble. Moreover, the ranking operation guarantees neither that the identity of ranked nodes remains the same (e.g. two different countries can be ranked first in two different networks), nor that the corresponding complexity and fitness maintain their value across our sample (i.e. the nodes ranked first will, in general, have different values of $F_c$ and $Q_p$): this in turn implies that the degree of each ranked node may change as well (i.e. the nodes ranked first in different networks will, in general, have different degrees). These considerations motivate the need to quantify 1) the variation of any country's diversification as a function of its fitness and 2) the variation of any product's ubiquity as a function of its complexity. This is in line with the spirit of the research in [8, 9]: trying to establish a one-to-one relation both between ubiquity and complexity and between fitness and diversification, in order to unambiguously rank countries and products. This kind of analysis represents a highly non-trivial test bench for our model, which appears to perform very well.
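
Since fitness and complexity are only available numerically, a short sketch of the iteration may help. It assumes the commonly used formulation of the algorithm of [8]: at each step the fitness of a country is the sum of the complexities of its products, the complexity of a product is the inverse of the sum of the inverse fitnesses of its exporters, and both vectors are then divided by their mean. The function name, the number of iterations and the toy matrix are hypothetical choices.

```python
import numpy as np

def fitness_complexity(M, n_iter=20):
    """Iterate the (assumed) fitness-complexity map starting from F = Q = 1."""
    M = np.asarray(M, dtype=float)
    F = np.ones(M.shape[0])
    Q = np.ones(M.shape[1])
    for _ in range(n_iter):
        F_tilde = M @ Q                        # sum_p m_cp Q_p (previous Q)
        Q_tilde = 1.0 / (M.T @ (1.0 / F))      # 1 / sum_c m_cp / F_c (previous F)
        F = F_tilde / F_tilde.mean()           # normalize by the average over countries
        Q = Q_tilde / Q_tilde.mean()           # normalize by the average over products
    return F, Q

# Toy, perfectly nested matrix: country 0 exports everything, country 2 one product
M = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 0]])
F, Q = fitness_complexity(M)
country_ranking = np.argsort(-F)    # most fit country first
product_ranking = np.argsort(-Q)    # most complex product first
```

A small number of iterations is used on purpose: on nested matrices the lowest fitness values decay towards zero, so a production implementation would need a convergence criterion and some care with divisions by very small numbers.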


Motifs 

The motifs analysis has been carried out by calculating two different quantities. The first one has been defined as

$$s_m = \frac{N_m(\mathbf{M}) - \langle N_m\rangle}{\langle N_m\rangle} \qquad (23)$$

and named similarity: it quantifies the goodness of our prediction, measuring the relative difference between the observed and expected abundances. Besides similarity, we have also considered the traditional z-scores [3, 28, 39], defined as the ratio of the difference between the observed and expected abundances to the corresponding standard deviation

$$z_m = \frac{N_m(\mathbf{M}) - \langle N_m\rangle}{\sigma_m} \qquad (24)$$

with $\sigma_m = \sqrt{\langle N_m^2\rangle - \langle N_m\rangle^2}$ and $m$ indicating the particular motif considered. Even if z-scores have been recognized to be dependent on the network size [46] (at least for monopartite networks), our dataset collects matrices with very similar volume ($R \in [0.56, 0.61]$): thus, we can expect this effect to be very small.

Notice that similarity and z-scores provide complementary information: in particular, the latter measure the statistical significance of the agreement found by the former, accounting for the role of higher-order correlations not included in our constraints. Moreover, their ratio $s_m/z_m = \sigma_m/\langle N_m\rangle$ coincides with the motif-specific coefficient of variation, quantifying to what extent the average sums up the relevant information encoded in the corresponding ensemble distribution. Naturally, as for the observed abundances, both $s_m$ and $z_m$ can be defined for specific subsets of nodes as well.
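
The two indicators of eqs. (23) and (24) can be estimated from a sample of the ensemble as in the following sketch. The sampled abundances are made-up numbers used only for illustration, and the function name is ours.

```python
import numpy as np

def similarity_and_zscore(n_obs, n_samples):
    """Similarity s_m (eq. 23) and z-score z_m (eq. 24) estimated from ensemble samples."""
    n_samples = np.asarray(n_samples, dtype=float)
    mean = n_samples.mean()      # estimate of <N_m>
    sigma = n_samples.std()      # estimate of sigma_m = sqrt(<N_m^2> - <N_m>^2)
    s = (n_obs - mean) / mean
    z = (n_obs - mean) / sigma
    return s, z

# Hypothetical motif abundances measured on five sampled matrices
samples = [90.0, 100.0, 110.0, 95.0, 105.0]
s, z = similarity_and_zscore(80.0, samples)

# The ratio s/z is the coefficient of variation of the ensemble distribution
cv = np.std(samples) / np.mean(samples)
```

Here the observed abundance lies 20% below the ensemble mean, and `s / z` coincides with `cv` by construction.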

Fig. 6 shows the analysis of the $V_n$ and $\Lambda_n$ motifs. First, we have sampled the V-motifs and Λ-motifs abundances on the ensemble, in order to verify their distribution (see the Supplementary Information): both follow a gaussian very closely. Since all our motifs are sums of (neither independent nor identically distributed) random variables, this may be seen as a consequence of the generalized Central Limit Theorem. z-scores can thus be attributed the correct probabilistic meaning of (gaussian) standardized variables [39, 46], and choosing a threshold $z_0$ for $z$ allows the identification of significantly deviating patterns. In what follows we will choose $z_0 = \pm 1.65$ as threshold values for the aggregated $V_n$ and $\Lambda_n$ families and $z_0 = \pm 2$ for the corresponding subset-specific ones (see the Supplementary Information for a justification of such values). Naturally, if the observations were exactly reproduced by our null model, the z-scores would be zero.
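
A sampling procedure of this kind can be sketched as follows, assuming that the null model treats the matrix entries as independent Bernoulli variables with probabilities $p_{cp}$. For brevity the probability matrix below is uniform (a random-graph-like, hypothetical stand-in rather than the actual BiCM solution), and all names are ours.

```python
import numpy as np

def count_v_and_lambda(M):
    """N_V = sum_p C(u_p, 2) and N_Lambda = sum_c C(d_c, 2) for a binary matrix M."""
    u = M.sum(axis=0)                        # product ubiquities
    d = M.sum(axis=1)                        # country diversifications
    n_v = int((u * (u - 1) // 2).sum())
    n_lambda = int((d * (d - 1) // 2).sum())
    return n_v, n_lambda

def sample_ensemble(P, n_samples, rng):
    """Draw binary matrices with independent entries, m_cp = 1 with probability p_cp."""
    return (rng.random((n_samples,) + P.shape) < P).astype(int)

rng = np.random.default_rng(0)
M_obs = np.array([[1, 1, 1, 1],
                  [1, 1, 1, 0],
                  [1, 1, 0, 0],
                  [1, 0, 0, 0]])
P_hyp = np.full(M_obs.shape, M_obs.mean())   # hypothetical uniform probabilities

samples = sample_ensemble(P_hyp, 2000, rng)
n_v_samples = np.array([count_v_and_lambda(S)[0] for S in samples])

n_v_obs, n_lambda_obs = count_v_and_lambda(M_obs)
z_v = (n_v_obs - n_v_samples.mean()) / n_v_samples.std()
is_significant = abs(z_v) > 1.65             # threshold quoted in the text
```

With the actual BiCM probabilities in place of `P_hyp`, the same loop would give the sampled distributions whose gaussianity is discussed above.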

The evolution of both similarity and z-scores across the years in our database points out that the $\Lambda_n$ family is better reproduced than the $V_n$ family (showing a similarity and a z-score closer to zero; see panels 6a and 6b). In particular, $V_n$ z-scores lie outside the boundary of the significance region, showing values lower than $-1.65$. This indicates that, for the binary, bipartite representation of the WTW, the degree sequence is far more effective in reproducing the correlations between products than those between countries. In other words, we correctly capture the countries' tendency to expand their production, which seems to co-exist with a certain superposition of the countries' baskets of products (see M-motifs in the Supplementary Information). However, the BiCM overestimates the resemblance of the different baskets: as z-scores indicate, world countries tend to form fewer V-motifs than expected under our null model (further confirmed by the trend of X-motifs and W-motifs; see the Supplementary Information). Summing up, world countries show a clear tendency to diversify their production, while at the same time avoiding direct competition on the same products.

The comparison between similarity and z-score clarifies the role of the average in characterizing the ensemble distribution of the $V_n$ and $\Lambda_n$ families: the ratio $s_m/z_m < 0.1$ justifies our interest in their ensemble average alone.

However, z-scores of the $V_n$ and $\Lambda_n$ families result in almost flat trends, which allow us to draw only general conclusions on the WTW as a whole. The reason lies in the “aggregated” character of such motifs, which do not distinguish between different subsets of countries or products. To be more precise, let us consider the temporal evolution of our motifs on specific subsets of nodes (see panels 6c and 6d): in particular, the Asian Tigers (South Korea, Singapore, Taiwan, Hong Kong), the BRICS countries (Brazil, USSR/Russia, India, China, South Africa), the European countries belonging to the G7 (France, Italy, Germany, United Kingdom) and a number of Eastern European countries (Hungary, Romania, Bulgaria, Poland, USSR/Russia), and let us calculate the temporal evolution of $V_4$ and $V_5$ motifs restricted to them. The European countries show a z-score almost constantly equal to 4, indicating a significant affinity which is maintained over time. An even stronger internal affinity is shown by the Asian Tigers, to which China should be added (in fact, its addition to the group raises the z-score). On the other hand, BRICS countries show a very limited affinity [8, 9, 47]: their trend becomes more and more consistent with the null model, becoming negative in recent years. The last two examples point out the limitations of the traditional economic classification (usually distinguishing China from the Asian Tigers and gathering the BRICS together), which does not capture any actual economic likeness.

Eastern European countries, on the other hand, show a strong correlation before 1989, gradually declining as this topical year approaches. Interestingly enough, after 1989 such correlation does not disappear, remaining statistically significant (and stabilizing around $z \simeq 2$): this seems to indicate a significant connection still persisting, with Russia having replaced the USSR as “reference” country. An additional test is provided by the random choice of four countries (Ghana, China, Mozambique, Austria): although close to zero, their trend is constantly negative. In fact, since Ghana and Mozambique are low-diversification countries, they will be linked only to high-ubiquity products, common to all countries (see fig. 3): thus, their basket will be far more limited than China's and Austria's, limiting in turn their possibility to compete. The constantly negative sign indicates, in this case, the impossibility to compete.

This kind of analysis can be repeated for $\Lambda_n$ motifs as well, allowing us to gain a substantial insight into the products correlations. Panels 6e and 6f show some examples. While the food sector we have considered shows a constantly high value of $z$, indicating the common origin of the chosen dairy products, the pink trend signals a non-trivial positive correlation between the sectors represented by worked aluminium artifacts, tractors and fruit. A possible explanation may rest upon the consideration that tractors are made of aluminium parts and are, in turn, used to transport the picked fruit. Consistently, the last group of products (cheese, rods and locomotives) is characterized by the value $z \simeq 0$.

Notice that while for some groups of nodes the first moment encloses a great part of the relevant information ($s_m/z_m < 0.5$), for other groups higher-order moments could provide additional, useful information ($s_m/z_m \simeq 1$), e.g. the distribution asymmetry. Interestingly, these circumstances are mostly encountered for countries and products, respectively.

FIG. 6. Analysis of motifs. Top panels: z-scores (panel a) and similarity (panel b) evolution across our database years of $N_V$ (•), $N_{V_3}$ (•), $N_{V_4}$ (•), $N_{V_5}$ (•), $N_\Lambda$ (•), $N_{\Lambda_3}$ (•), $N_{\Lambda_4}$ (•), $N_{\Lambda_5}$ (•). Middle panels: z-scores (panel c) and similarity (panel d) evolution of $V_n$-motifs, restricted to subsets of countries: Asian Tigers (•), Asian Tigers plus China (•), EU countries in G7 (•), BRICS (•), eastern countries (•), four randomly chosen countries (•). Bottom panels: z-scores (panel e) and similarity (panel f) evolution of $\Lambda_n$-motifs, restricted to subsets of products: “fruit and parts of plants”, “aluminium and aluminium alloys”, “road tractors” (•), “milk and cream”, “butter”, “cheese” (•), four randomly chosen products (•). Right column, panel f: similarity evolution across our database years of the same motifs. Our method correctly captures the countries' tendency to expand their production ($\Lambda_n$-motifs), even if the resemblance of the different baskets of products is overestimated ($V_n$-motifs). Moreover, our method identifies statistically significant correlations among subsets of countries and products.

FIG. 7. Analysis of the assortativity coefficient and nestedness: z-scores (panel a) and similarity (panel b) evolution across our database years of $r$ (•), NODF (•), nestedness along rows (•) and columns (•). While we are predicting a less disassortative network than observed, our method correctly reproduces the matrix nestedness.


Assortativity coefficient and nestedness 

As for the $V_n$ and $\Lambda_n$ motifs, the assortativity coefficient has a gaussian ensemble distribution (see the Supplementary Information). Both the observed value $r$ and its z-score signal that we are globally overestimating the network assortativity: more precisely, since our expected coefficient $\langle r\rangle$ is still negative, we are predicting a less disassortative network than observed (see fig. 7). This is a consequence of our randomization procedure, which distributes links between nodes more homogeneously (recall that, consistently, our predicted $\{d_p^{nn}\}_{p=1}^{P}$ and $\{u_c^{nn}\}_{c=1}^{C}$ show less steeply decreasing trends than the observed ones; see fig. 3).

In order to better understand the concept of nestedness, let us explicitly draw a matrix from the BiCM-induced grandcanonical ensemble, ranking its rows and columns according to the $F_c$ and $Q_p$ measures [8, 9]. The result is shown in fig. 8. Notice that nestedness cannot be simply reduced to the concept of “triangularity” of a matrix. In fact, even if the drawn matrix shows a more curved boundary than the observed one, both the nestedness ensemble distribution (see the Supplementary Information) and its z-score (fig. 7) signal that our method reproduces it correctly.


We have also measured the nestedness along rows and the nestedness along columns separately (according to the definitions in [41]). While the latter is reproduced and closely follows the trend of the global one, the former is, for a few years, significantly underestimated. This is non-trivially related to the way our null model redistributes V-motifs and Λ-motifs. However, as the bottom panel in fig. 8 suggests, a role seems to be played by the asymmetry of our bipartite matrix as well: in other words, the higher cardinality of the products layer seems to induce a preferential filling of the rows, making them more homogeneous and lowering their expected nestedness.

It should also be noted that the ensemble coefficient of variation for both $r$ and NODF shows such a small value ($s_m/z_m \simeq 10^{-2}$ for both, across our temporal dataset) that the ensemble average can be considered as the only moment carrying relevant information.
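
For readers who want to reproduce the nestedness measurements, the following sketch computes the global NODF together with its row and column contributions. It assumes the standard paired-overlap rule of [41] (a pair of rows contributes only if their fills differ, and the contribution is the fraction of ones of the emptier row that are shared with the fuller one) and the normalization to $[0, 1]$ used in the Supplementary Information. Function names and toy matrices are hypothetical.

```python
import numpy as np

def paired_overlap_sum(A):
    """Sum over pairs of rows of the NODF paired overlap (zero for rows of equal fill)."""
    A = np.asarray(A, dtype=int)
    fill = A.sum(axis=1)
    shared = A @ A.T                          # number of common ones for each pair of rows
    total = 0.0
    n = A.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            smaller = min(fill[i], fill[j])
            if fill[i] != fill[j] and smaller > 0:
                total += shared[i, j] / smaller
    return total

def nodf(M):
    """Return (global NODF, NODF along rows, NODF along columns), each in [0, 1]."""
    M = np.asarray(M, dtype=int)
    C, P = M.shape
    s_rows = paired_overlap_sum(M)            # pairs of countries
    t_cols = paired_overlap_sum(M.T)          # pairs of products
    total = 2.0 * (s_rows + t_cols) / (C * (C - 1) + P * (P - 1))
    return total, 2.0 * s_rows / (C * (C - 1)), 2.0 * t_cols / (P * (P - 1))

nested = np.array([[1, 1, 1],
                   [1, 1, 0],
                   [1, 0, 0]])
diagonal = np.eye(3, dtype=int)
nodf_nested = nodf(nested)
nodf_diagonal = nodf(diagonal)
```

A perfectly nested matrix reaches the maximum value, while a diagonal matrix (all fills equal, no overlap) scores zero on every component.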


DISCUSSION 

In this paper we have both proposed a method to randomize binary, undirected, bipartite networks, by constraining essential network features such as the total number of links and the nodes connectivity, and tested it on a real system, the World Trade Web. While, on the one hand, specifying the degree sequence allows highly non-trivial properties like countries fitness, products complexity and the matrix nestedness to be reproduced across our whole dataset, on the other hand quantities like assortativity and motifs still elude a satisfactory explanation.

This is even more surprising when considering the high level of accuracy achieved by the CM predictions in the analysis of the monopartite representation of the WTW. Our findings suggest that analysing different representations of the same network can indeed convey additional information, as shown by the agreement between the observed assortativity and the expected one (see fig. 3), which is lower than in the corresponding monopartite WTW [26]. In other words, the correlations between countries induced by their productivity relations, clearly displayed by the bipartite representation of the WTW, are only partially explained by the degree sequence, calling for a higher amount of information to achieve the same level of accuracy obtained for the monopartite representation (and analogously for products). Otherwise stated, representing the same system via different network models (even belonging to the same class of binary, undirected configurations) may strongly affect the effectiveness of a given piece of information (such as the nodes connectivity) in reproducing the observed structure.

FIG. 8. Upper panel: the real World Trade Web matrix in the year 2000, with rows and columns in increasing order of fitness and complexity [8, 9]. Lower panel: matrix drawn from the BiCM-induced grandcanonical ensemble for the same year and ordered according to the same criterion.

Assortativity again provides the clearest example: as previously pointed out, the bipartite Configuration Model predicts trends quite similar to those expected under the bipartite Random Graph. To better quantify this difference, we have calculated the Shannon entropy (normalized to the total number of node pairs, i.e. the network volume) of the probability distributions induced by the BiRG and the BiCM:



FIG. 9. Top panel: analysis of the degrees correlations on the projected WTW, in the year 2000, on the countries layer (blue: observed trend; black: prediction under the CM; red: prediction under the RG). Middle panel: analysis of the degrees correlations on the projected WTW, in the year 2000, on the products layer (blue: observed trend; black: prediction under the CM; red: prediction under the RG). Bottom panel: Shannon entropy (plotted vs year) of the uniform distribution (•), of the bipartite Random Graph model (•) and of the bipartite Configuration Model (•) over the grandcanonical ensemble of binary, undirected, bipartite networks.
for the BiRG (see Supplementary Information). Results are shown in the bottom panel of fig. 9. As evident from the trends, while specifying the total number of links strongly reduces the uncertainty (as signalled by the low value of the connectance, reducing the ensemble entropy to half its maximum value), further specifying the degree sequence produces a less relevant effect than one could expect on the basis of the well-known monopartite results [26]. Comparing the analyses of degree correlations for the bipartite and the projected WTW (on both countries and products layers; top and middle panels of fig. 9 for the year 2000), what emerges is quite impressive: while the CM prediction correctly overlaps with the observed trend, the RG predicts a flat trend completely missing the observed cloud of points (in line with the results already obtained for the monopartite representation [26]). In terms of Shannon entropy, when passing from the RG to the CM the reduction of uncertainty on the observed, projected WTW amounts to 41%; for the bipartite WTW, this percentage reduces to only 16% (see fig. 9). These findings clearly indicate a future extension of our work: constraining those quantities having a significant impact on node correlations, such as V-motifs, Λ-motifs or nestedness, in order to define a more effective null model.
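
The entropy comparison can be illustrated with a short sketch. It assumes that each model factorizes into independent Bernoulli variables, one per country-product pair, so that the ensemble entropy is the sum of the single-entry entropies; dividing by the number of pairs gives the normalized value. The probability matrices below are hypothetical and share the same expected number of links.

```python
import numpy as np

def entropy_per_pair(P):
    """Shannon entropy (in nats) of independent Bernoulli entries, divided by C * P."""
    P = np.asarray(P, dtype=float)
    q = np.clip(P, 1e-12, 1.0 - 1e-12)        # guards against log(0) for deterministic entries
    h = -(q * np.log(q) + (1.0 - q) * np.log(1.0 - q))
    return h.mean()

uniform = np.full((3, 4), 0.5)                # no constraint: all configurations equiprobable
birg_like = np.full((3, 4), 0.25)             # only the expected number of links is fixed
bicm_like = np.array([[0.6, 0.4, 0.2, 0.1],   # heterogeneous probabilities, same mean 0.25
                      [0.4, 0.25, 0.1, 0.05],
                      [0.3, 0.3, 0.2, 0.1]])

s_uniform = entropy_per_pair(uniform)
s_birg = entropy_per_pair(birg_like)
s_bicm = entropy_per_pair(bicm_like)
reduction = (s_birg - s_bicm) / s_birg        # relative reduction of uncertainty
```

Because the single-entry entropy is concave, any heterogeneous set of probabilities with the same mean has a lower entropy than the homogeneous one; the size of `reduction` measures how informative the additional constraints are.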

However, as the analysis of motifs reveals, the BiCM provides the right benchmark to highlight meaningful correlations between countries and products, representing a purely topological alternative to the traditional economic classification, whose limitations have already been pointed out [8, 9, 49]. Remarkably, this kind of analysis can be repeated for different years, in order to monitor our system over time and detect significant temporal trends of the world economies' co-evolution.

We stress that our approach is grandcanonical and that possible extensions of the method move in the same direction. The paper in [31], on the other hand, implements the microcanonical version of a mono-layer regular random graph: as for monopartite networks, comparing the performance of the two available approaches represents a challenging future research direction.

Future work will move towards extending the present framework to directed, as well as weighted, network models, to test the robustness of our findings also for configurations beyond the binary, undirected ones.


SUPPLEMENTARY INFORMATION 

The Random Graph model 

In the main text we have explicitly shown only 
the first and last passages of the calculations for the 
bipartite Configuration Model. The full passages are 


reported below: 
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calculations for the BiRG proceed along the same 
lines of those for the BiCM: 
(with $e^{-\theta} \equiv x$). Some more algebra leads to
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(28) 


with $x/(1 + x) \equiv p$. Maximizing the network log-likelihood function leads to the result $p = c(\mathbf{M}) = L(\mathbf{M})/(C \cdot P)$.
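
The maximum-likelihood result can be checked numerically with a few lines. The sketch assumes the log-likelihood of independent, identically distributed Bernoulli entries, $\ln \mathcal{L}(p) = L \ln p + (C \cdot P - L)\ln(1 - p)$; the toy matrix and the grid search are illustrative choices only.

```python
import numpy as np

def birg_log_likelihood(M, p):
    """Log-likelihood of a binary matrix under a single link probability p."""
    n_links = M.sum()
    n_pairs = M.size
    return n_links * np.log(p) + (n_pairs - n_links) * np.log(1.0 - p)

M = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])

p_hat = M.sum() / M.size                      # connectance L / (C * P)

# Brute-force check: the likelihood over a grid peaks at the connectance
grid = np.linspace(0.01, 0.99, 99)
p_grid = grid[np.argmax([birg_log_likelihood(M, p) for p in grid])]
```

For this matrix 6 of the 12 possible links are present, so both estimates sit at 0.5.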


Topological measures for binary, undirected, bipartite networks

Complexity and fitness. In order to infer the productive properties of the different countries from the biadjacency matrix $\mathbf{M}$, in the context of Economic Complexity [7-9], the fitness and complexity algorithm has been proposed in [8]; roughly speaking, it is a generalization of the Google PageRank to bipartite networks. The algorithm assigns high fitness to the countries exporting the most exclusive (i.e. with higher complexity) products. In particular, the fitness $F_c$ for country $c$ and the complexity $Q_p$ for product $p$ are defined, at the $n$-th iteration of the algorithm, as

(29)

where the symbols $\langle\dots\rangle$ indicate the averages taken over the sets $\{\tilde{F}_c^{(n)}\}_{c=1}^{C}$ and $\{\tilde{Q}_p^{(n)}\}_{p=1}^{P}$. The initial conditions can be chosen to be $F_c^{(0)} = Q_p^{(0)} = 1$, $\forall c, \forall p$. Further details on the convergence of the algorithm in (29) can be found in [45]. The non-linear behaviour of fitness and complexity can be highlighted by respectively comparing the value of the diversification $d_c$ (ubiquity $u_p$) with the ranking obtained through the fitness $F_c$ (complexity $Q_p$) values, as shown in fig. 3 of the Main Text.

Nestedness. Several different definitions of nestedness can be encountered in the literature [41-44]. In the present article we use the definition called NODF (an acronym for Nestedness metric based on Overlap and Decreasing Fill), proposed in [41]. Let us define

(30)

(31)

Notice that $S_{cc'}$ ($T_{pp'}$) are solely determined by those pairs of countries (products) for which the number of ones in rows $c$ and $c'$ (in columns $p$ and $p'$) are different. The measure of nestedness called NODF is then defined as

$$\mathrm{NODF} = 2\,\frac{\sum_{c<c'} S_{cc'} + \sum_{p<p'} T_{pp'}}{C(C-1) + P(P-1)} \qquad (32)$$

The definition (32) results from summing the contributions coming from rows and from columns, normalized to the total number of pairs of rows and columns. In order to isolate the single contributions coming from rows and columns, it is possible to define the countries-specific and the products-specific NODF, respectively as

$$\mathrm{NODF}_c = 2\,\frac{\sum_{c<c'} S_{cc'}}{C(C-1)}, \qquad \mathrm{NODF}_p = 2\,\frac{\sum_{p<p'} T_{pp'}}{P(P-1)} \qquad (33)$$


Expected topological measures for binary, undirected, bipartite networks

Assortativity. The expected value of the assortativity coefficients is easily calculable, after noticing that $\langle d_c\rangle = d_c$ and $\langle u_p\rangle = u_p$ by construction, that $m_{cp}^2 = m_{cp}$, $m_{cp}$ being a binary variable, and resting upon the approximation $\langle n/d\rangle \simeq \langle n\rangle/\langle d\rangle$:

$$\langle u_c^{nn}\rangle \simeq \frac{\sum_{p=1}^{P} p_{cp}(u_p - p_{cp} + 1)}{d_c} \qquad (34)$$

$$\langle d_p^{nn}\rangle \simeq \frac{\sum_{c=1}^{C} p_{cp}(d_c - p_{cp} + 1)}{u_p} \qquad (35)$$

Assortativity standard deviation can be calculated by applying the so-called delta method, whose generic formula reads

(36)

providing a method to calculate the standard deviation of any function of interest, $X(\mathbf{M})$, in terms of the independent random variables (in our case the entries $m_{cp}$ of the biadjacency matrix).

Motifs. In addition to the $V_n$ and $\Lambda_n$ families we can define more complex motifs, enlarging the number of nodes of the two layers to be considered. For example, X-motifs can be defined, i.e. combinations of two V-motifs subtending the same pairs of countries and products (see fig. 2 in the Main Text):
$$N_X(\mathbf{M}) = \sum_{c<c'}\sum_{p<p'} m_{cp}m_{cp'}m_{c'p}m_{c'p'} \qquad (37)$$

(the notation $\sum_{c<c'}$ being equivalent to $\sum_{c=1}^{C}\sum_{c'=c+1}^{C}$, and similarly for products). As evident from the definition, X-motifs measure the co-occurrence of two countries as producers of the same pair of products and, vice versa, the co-occurrence of two products in the baskets of the same two countries. Thus, competitiveness on different segments of the market can now be measured, refining the information provided by V-motifs.

Allowing for an even higher number of nodes to interact, M-motifs and W-motifs can be defined (see fig. 2 in the Main Text) as

$$N_M(\mathbf{M}) = \sum_{c<c'}\sum_{p<p'<p''} m_{cp}m_{cp'}m_{cp''}m_{c'p}m_{c'p'}m_{c'p''}, \qquad N_W(\mathbf{M}) = \sum_{p<p'}\sum_{c<c'<c''} m_{cp}m_{cp'}m_{c'p}m_{c'p'}m_{c''p}m_{c''p'} \qquad (38)$$

respectively. As evident from the definition, countries competitiveness is now measured on a larger number of products.

FIG. 10. Ensemble distribution of $N_V$ and $N_\Lambda$ abundance (panels a and c), assortativity coefficient $r$ and NODF (panels b and d) in the year 2000 (fits are obtained by superimposing normal distributions with the sample average and variance; red points represent the observed motifs abundance).

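Since all motifs are defined in terms of products of biadjacency matrix entries and the latter are treated as independent random variables by our null model, their expectation value can be computed exactly. Thus, we have

$$\begin{aligned}
\langle N_V\rangle &= \sum_{c=1}^{C}\sum_{c'=c+1}^{C}\sum_{p=1}^{P} p_{cp}p_{c'p}, \\
\langle N_\Lambda\rangle &= \sum_{p=1}^{P}\sum_{p'=p+1}^{P}\sum_{c=1}^{C} p_{cp}p_{cp'}, \\
\langle N_X\rangle &= \sum_{c=1}^{C}\sum_{c'=c+1}^{C}\sum_{p=1}^{P}\sum_{p'=p+1}^{P} p_{cp}p_{cp'}p_{c'p}p_{c'p'}, \\
\langle N_M\rangle &= \sum_{c=1}^{C}\sum_{c'=c+1}^{C}\sum_{p<p'<p''} p_{cp}p_{cp'}p_{cp''}p_{c'p}p_{c'p'}p_{c'p''}, \\
\langle N_W\rangle &= \sum_{p=1}^{P}\sum_{p'=p+1}^{P}\sum_{c<c'<c''} p_{cp}p_{cp'}p_{c'p}p_{c'p'}p_{c''p}p_{c''p'}.
\end{aligned} \qquad (39)$$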
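
The first three sums in eq. (39) can be evaluated without explicit nested loops over all indices, as in the following sketch. The function name and the random probability matrix are hypothetical; the identity used is $\sum_{i<j} a_i a_j = [(\sum_i a_i)^2 - \sum_i a_i^2]/2$.

```python
import numpy as np
from itertools import combinations

def expected_motifs(P):
    """Expected V-, Lambda- and X-motif abundances for independent entries with probabilities P."""
    P = np.asarray(P, dtype=float)
    G = P @ P.T                                   # G[c, c'] = sum_p p_cp p_c'p
    n_v = (G.sum() - np.trace(G)) / 2.0           # sum over pairs of countries c < c'
    H = P.T @ P                                   # H[p, p'] = sum_c p_cp p_cp'
    n_lambda = (H.sum() - np.trace(H)) / 2.0      # sum over pairs of products p < p'
    n_x = 0.0
    for c, c2 in combinations(range(P.shape[0]), 2):
        a = P[c] * P[c2]                          # a_p = p_cp p_c'p
        n_x += (a.sum() ** 2 - (a ** 2).sum()) / 2.0   # sum over p < p' of a_p a_p'
    return n_v, n_lambda, n_x

rng = np.random.default_rng(1)
P_hyp = rng.random((4, 5))                        # hypothetical link probabilities
n_v, n_lambda, n_x = expected_motifs(P_hyp)

# Deterministic limit: a full 2 x 2 matrix contains two V-motifs, two Lambda-motifs, one X-motif
full = expected_motifs(np.ones((2, 2)))
```

The same trick extends to the M- and W-motifs, at the price of a sum over triples.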

However, when computing the expected value of the generalizations of the V-motifs and Λ-motifs (the $V_n$ and $\Lambda_n$ families), higher-order powers of the nodes' degrees appear, since their definition reads $N_{V_n} = \sum_{p=1}^{P}\binom{u_p}{n}$ and $N_{\Lambda_n} = \sum_{c=1}^{C}\binom{d_c}{n}$. In these cases, we can exploit the evidence that our degrees can be considered (to a good approximation) gaussian-distributed over the ensemble induced by the BiCM (for example, when considering $N_{V_3} = \sum_{p=1}^{P} u_p(u_p - 1)(u_p - 2)/3!$, the well-known result stating that odd central moments of a gaussian distribution are zero can be used to greatly simplify the calculations).

The motifs standard deviation can be calculated by applying the delta method, with $X(\mathbf{M}) = N_m(\mathbf{M})$. While valid in general, this formula assumes a particularly simple form for the $V_n$ and $\Lambda_n$ families of motifs. Making it explicit for a few cases will allow us to achieve a double goal: 1) providing a simple expression for the z-scores of the corresponding motifs and 2) showing a limitation of the traditional definition of z-scores. All the calculations will be carried out for the $V_n$ family, since they can be easily generalized to the $\Lambda_n$ family. Let us start by noticing that $V_n$ motifs are functions of the products degrees exclusively. Now, since in a bipartite network the nodes degrees within each layer are independent random variables, specifying eq. 36 for the $V_n$ family leads one to write

(40)

further simplifiable using the binomial result

(41)

with $H_i = \sum_{k=1}^{i} 1/k$ being the $i$-th harmonic number. Putting everything together, for $n = 2$ we have

$$N_V = \sum_{p=1}^{P}\binom{u_p}{2} = \sum_{p=1}^{P}\frac{u_p(u_p - 1)}{2}; \qquad (42)$$

now, $\langle N_V\rangle = \sum_{p=1}^{P}\left[\langle u_p^2\rangle - u_p\right]/2$. This allows us to calculate the difference between $N_V$ and $\langle N_V\rangle$ simply in terms of the variance of the total number of links

$$N_V - \langle N_V\rangle = -\frac{\sum_{p=1}^{P}\sigma_{u_p}^2}{2} = -\frac{\sigma_L^2}{2} \qquad (43)$$

where $\sigma_{u_p}^2 = \langle u_p^2\rangle - \langle u_p\rangle^2 = \sum_c \sigma_{m_{cp}}^2 = \sum_c p_{cp}(1 - p_{cp})$. In order to calculate the standard deviation, we use eq. (40) to find

$$\sigma_{N_V}^2 \simeq \frac{\sum_{p=1}^{P}(2u_p - 1)^2\sigma_{u_p}^2}{4} \qquad (44)$$

and finally obtain

$$z_V \simeq -\frac{\sum_{p=1}^{P}\sigma_{u_p}^2}{\sqrt{\sum_{p=1}^{P}(2u_p - 1)^2\sigma_{u_p}^2}} \qquad (45)$$


The same procedure can be applied to all the motifs belonging to the $V_n$ and $\Lambda_n$ families. More explicitly, in the cases $N_{V_3} = \sum_{p=1}^{P}\binom{u_p}{3}$ and $N_{V_4} = \sum_{p=1}^{P}\binom{u_p}{4}$ the following results hold:

$$z_{V_3} \simeq \frac{\dots}{\sqrt{\sum_{p=1}^{P}(3u_p^2 - 6u_p + 2)^2\sigma_{u_p}^2}} \qquad (46)$$

and

$$z_{V_4} \simeq \frac{\dots}{\sqrt{\sum_{p=1}^{P}(4u_p^3 - 18u_p^2 + 22u_p - 6)^2\sigma_{u_p}^2}} \qquad (47)$$

We have also tested the agreement between the analytical expressions of the aforementioned z-scores and the values obtained by explicitly sampling the grandcanonical ensemble induced by the BiCM. As fig. 11 shows, despite the presence of two approximations, our analytical estimates work quite satisfactorily.
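
A test of this kind can be sketched for the simplest case, the V-motifs. The analytical value uses the expression obtained by combining the mean shift $-\sum_p \sigma_{u_p}^2/2$ with the delta-method standard deviation $\sqrt{\sum_p (2u_p - 1)^2\sigma_{u_p}^2}/2$; the sampled value comes from matrices with independent Bernoulli entries. The probability matrix is a hypothetical stand-in for the BiCM one, and the "observed" ubiquities are taken equal to their ensemble averages, as guaranteed by the constraints.

```python
import numpy as np

def analytical_z_v(P):
    """Delta-method estimate of the V-motif z-score from the link probabilities."""
    P = np.asarray(P, dtype=float)
    u = P.sum(axis=0)                              # constrained ubiquities u_p
    var_u = (P * (1.0 - P)).sum(axis=0)            # sigma_{u_p}^2 = sum_c p_cp (1 - p_cp)
    return -var_u.sum() / np.sqrt(((2.0 * u - 1.0) ** 2 * var_u).sum())

def sampled_z_v(P, n_samples, rng):
    """z-score of the V-motif abundance estimated by sampling the ensemble."""
    P = np.asarray(P, dtype=float)
    u_obs = P.sum(axis=0)
    n_v_obs = (u_obs * (u_obs - 1.0) / 2.0).sum()
    draws = rng.random((n_samples,) + P.shape) < P
    u = draws.sum(axis=1)                          # ubiquities of each sampled matrix
    n_v = (u * (u - 1) / 2.0).sum(axis=1)          # V-motif abundance of each sample
    return (n_v_obs - n_v.mean()) / n_v.std()

rng = np.random.default_rng(42)
P_hyp = np.outer(np.linspace(0.2, 0.8, 20), np.ones(5))   # 20 countries, 5 products
z_analytical = analytical_z_v(P_hyp)
z_sampled = sampled_z_v(P_hyp, 20000, rng)
```

Both estimates are negative by construction, which is the sign definiteness discussed in the text.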

Figs. 12 and 13 show the analysis of the X-, M- and W-motifs. As for the other motifs previously considered, the three distributions follow a gaussian very closely, whose mean and variance have been calculated on the networks sample (5000 matrices). Again, this can be ascribed to the (generalized) Central Limit Theorem. As a general remark, the three most complex motifs show higher fluctuations and are less accurately reproduced than the simpler ones (i.e. $V_n$ and $\Lambda_n$).

Besides providing a simple expression for the 
z-scores of the Vn and Λn families of motifs, we have 
also shown that the latter may have a definite sign 
(negative, in our case). While quite surprising, in 
cases like this z-scores can still be used to test 
the agreement between observations and predictions, 
but they should be treated as one-sided statistical tests 
of significance. This implies that the values enclosing 
the probabilities of 68%, 95% and 99% no longer 
coincide with z = ±1, z = ±2 and z = ±3, because 
they are no longer computed on both tails of the reference 
gaussian distribution. The right z values for one-sided 
tests are ±1.65, enclosing a probability of 95%, 
and ±2.33, enclosing a probability of 99% [50]. The 
three more complex motifs (X-motifs, M-motifs and 
W-motifs) do not have a definite sign. 
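
The thresholds quoted above can be checked against the standard normal distribution. The snippet below is a minimal sketch using only the Python standard library; the variable names are chosen for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard gaussian: mean 0, standard deviation 1

# Two-sided test: probability enclosed between -z and +z.
two_sided = {z: std_normal.cdf(z) - std_normal.cdf(-z) for z in (1, 2, 3)}

# One-sided test: z value that leaves the stated probability on one side only.
one_sided = {prob: std_normal.inv_cdf(prob) for prob in (0.95, 0.99)}

for z, prob in two_sided.items():
    print(f"two-sided |z| < {z}: probability {prob:.4f}")
for prob, z in one_sided.items():
    print(f"one-sided probability {prob:.2f}: |z| = {z:.3f}")
```

For a quantity whose z-score is negative by construction, the relevant one-sided thresholds are the negative ones, i.e. z = -1.65 and z = -2.33 to two decimals.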

The reason for the sign definiteness lies in the explicit 
dependence of the quantities of interest on 
the chosen constraints, as shown below. Let us consider 
the Taylor expansion of a quantity of interest 
f(x) around the expected value $\langle x\rangle$: 

$$f(x) = f(\langle x\rangle) + \frac{df}{dx}\bigg|_{\langle x\rangle}(x-\langle x\rangle) + \frac{d^2f}{dx^2}\bigg|_{\langle x\rangle}\frac{(x-\langle x\rangle)^2}{2!} + \dots \qquad (48)$$

After calculating $\langle f\rangle$, the expression can be rewritten as 

$$f(\langle x\rangle) - \langle f(x)\rangle = -\frac{d^2f}{dx^2}\bigg|_{\langle x\rangle}\frac{\sigma_x^2}{2!} - \dots \qquad (49)$$

FIG. 11. Comparison between the analytical expressions of the z-scores (on the y-axis) and the corresponding values 
obtained by explicitly sampling the grandcanonical ensemble induced by the BiCM (on the x-axis), for $N_V$, $N_{V_3}$, 
$N_{V_4}$, $N_{V_5}$ (•) and $N_{\Lambda_4}$, $N_{\Lambda_5}$ (•). 

FIG. 12. Analysis of motifs. From top to bottom: ensemble 
distribution of $N_X$, $N_M$ and $N_W$ abundance in 
the year 2000 (fits are obtained by superimposing normal 
distributions with the sample average and variance; 
red points represent the observed motifs abundance). 


Whenever x represents a given, chosen constraint 
whose expected value on the ensemble is, by definition, 
equal to the observed one, we obtain 

$$f(x) - \langle f(x)\rangle = -\frac{d^2f}{dx^2}\bigg|_{\langle x\rangle}\frac{\sigma_x^2}{2!} - \dots \qquad (50)$$

FIG. 13. Analysis of motifs. From top to bottom: z-scores 
and similarity evolution across our database years 
of $N_M$ (•), $N_X$ (•), $N_W$ (•). 


Now, if higher-order moments can be ignored or 
the function is quadratic in x, the right-hand side of 
the above equation reduces to its first term, which is 
the numerator of the z-score of f and is negative 
whenever the second derivative of f is positive. 
A simple example is provided by the function $x^2$, 
with x = L: we obtain $L^2 - \langle L^2\rangle = \langle L\rangle^2 - \langle L^2\rangle = -2\sigma^2_L/2! = -\sigma^2_L$, 
as it should. In the V-motifs case 
$f(x) = f(\{u_p\}) = \sum_p u_p(u_p-1)/2$ and $N_V - \langle N_V\rangle = -\sum_p \sigma^2_{u_p}/2 = -\sigma^2_L/2$, 
and analogously for the Λ, V3 
and Λ3 cases (remembering, for the latter, that odd 
central moments of a gaussian distribution are zero). 
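
The two identities quoted above, $\langle L\rangle^2 - \langle L^2\rangle = -\sigma^2_L$ and $f(\langle u\rangle) - \langle f(u)\rangle = -\sigma^2_u/2$ for $f(u) = u(u-1)/2$, can be verified exactly on a small example. The sketch below assumes a degree given by a sum of independent Bernoulli variables with hypothetical probabilities `probs`, and builds its exact distribution by convolution; all names are illustrative.

```python
def sum_of_bernoulli_pmf(probs):
    """Exact distribution of a sum of independent Bernoulli variables."""
    pmf = [1.0]  # the empty sum equals 0 with probability 1
    for q in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, weight in enumerate(pmf):
            new[k] += weight * (1 - q)   # link absent
            new[k + 1] += weight * q     # link present
        pmf = new
    return pmf


def v_motifs(u):
    """Quadratic quantity of interest: number of pairs among u links."""
    return u * (u - 1) / 2


probs = [0.1, 0.35, 0.5, 0.8, 0.9]  # hypothetical link probabilities
pmf = sum_of_bernoulli_pmf(probs)

mean = sum(k * w for k, w in enumerate(pmf))
mean_sq = sum(k * k * w for k, w in enumerate(pmf))
variance = sum(q * (1 - q) for q in probs)  # sum of the Bernoulli variances

# f(x) = x^2: the gap equals minus the variance.
gap_square = mean ** 2 - mean_sq

# f(u) = u (u - 1) / 2: the gap equals minus half the variance.
gap_v = v_motifs(mean) - sum(v_motifs(k) * w for k, w in enumerate(pmf))

print(gap_square, -variance)
print(gap_v, -variance / 2)
```

Both relations hold exactly here because the two functions are quadratic, so the Taylor expansion stops at second order; for cubic or quartic motif counts the gaussian assumption on higher moments is needed.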

This finding also has an obvious interpretation in 
terms of grandcanonical and microcanonical ensembles. 
Given a certain set of constraints, the microcanonical 
approach prescribes them to be exactly 
satisfied, implying that no statistical fluctuations 
of the latter can be observed [31]. On the other 
hand, the grandcanonical approach cannot reduce 
such fluctuations to zero (as also clearly shown by 
fig. 10) and the variance of the constraints will be positive: 
the dependence of a generic quantity of interest on 
it, according to the functional form shown before, is 
reflected in its sign definiteness. 
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